Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(filename): handle files beginning with a period or without extensions #185

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cjpearson
Copy link

@cjpearson cjpearson commented Sep 27, 2024

(#184)

This change resolves the issue reported with files containing no extension. It also adds tests for filenames such as .gitignore or .env which begin with a period.

@@ -57,10 +57,20 @@ export function resolveAlias(path: string, aliases: Record<string, string>) {
return _path;
}

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can improve regex instead to keep performance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it could be two checks, still keeping finalname logic independent)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I just ran some quick tests and it does seem slower. The bulk of the time comes from basename so maybe that can be swapped out.

This implementation seems to have a slight edge on the regex implementation when running on node 20.

export function filename(path: string) {
  const base = path.split(/[/\\]/).pop();

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems even better. what is edge case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean the results are very close. This approach is faster for 10 of the test strings and slower for 6 of them, but the difference is typically < 15%.

Here's the quick test I ran:

import { group, bench, run } from "mitata";
import { basename } from "pathe";

const paths = Object.keys({
  // POSIX
  "test.html": "test",
  "/temp/myfile.html": "myfile",
  "./myfile.html": "myfile",
  "/Users/john.doe/foo/myFile.js": "myFile",
  "/Users/john.doe/foo/myFile": "myFile",
  "./.hidden/myFile.ts": "myFile",
  "./.hidden/myFile": "myFile",
  "/temp/.gitignore": ".gitignore",

  // Windows
  "C:\\temp\\": undefined,
  "C:\\temp\\myfile.html": "myfile",
  "\\temp\\myfile.html": "myfile",
  ".\\myfile.html": "myfile",
  ".\\john.doe\\myfile.js": "myfile",
  ".\\john.doe\\myfile": "myfile",
  ".\\.hidden\\myfile.js": "myfile",
  ".\\.hidden\\myfile": "myfile",
  "C:\\temp\\.gitignore": ".gitignore",
})

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
const regex = function filename(path) {
  return path.match(FILENAME_RE)?.[2];
}

const noRegex = function filename(path) {
  const base = basename(path);

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const mixed = function filename(path) {
  const base = path.split(/[/\\]/).pop()
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const SLASH_RE=/[/\\]/
const lastIndex = function filename(path) {
  let slashIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    if (SLASH_RE.test(path.charAt(i))) {
      slashIndex = i;
      break;
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

paths.forEach((fixture) => {
  group(fixture, () => {
    Object.entries({
      regex,
      noRegex,
      basename,
      mixed,
      lastIndex
    }).forEach(([name, fn]) => {
      bench(name + ": " + fixture, () => {
        fn(fixture);
      });
    });
  });
});

run();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, bun seems to do much better with the iteration approach in lastIndex.

Copy link
Member

@pi0 pi0 Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might make it faster but avoid split/pop and instead look(backward) for the last segment separator in a single for of loop. In the same iteration, you can spot the first(last) dot position. (two chars with one stone haha)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That does seem to help, but it's still behind the other approaches. I added one more approach to use basename, but with the path splitting done with a regex instead of normalizeWindowsPath(path).split('/').

It performs nearly identically to the mixed function, but if the regex splitting can be used in the main basename function I think that would be my preference as you automatically limit possible inconsistencies between filename and basename. (For example, fixes like this would apply to both #180)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current slate of options:

import { group, bench, run } from "mitata";
import { basename } from "pathe";

const paths = Object.keys({
  // POSIX
  "test.html": "test",
  "/temp/myfile.html": "myfile",
  "./myfile.html": "myfile",
  "/Users/john.doe/foo/myFile.js": "myFile",
  "/Users/john.doe/foo/myFile": "myFile",
  "./.hidden/myFile.ts": "myFile",
  "./.hidden/myFile": "myFile",
  "/temp/.gitignore": ".gitignore",

  // Windows
  "C:\\temp\\": undefined,
  "C:\\temp\\myfile.html": "myfile",
  "\\temp\\myfile.html": "myfile",
  ".\\myfile.html": "myfile",
  ".\\john.doe\\myfile.js": "myfile",
  ".\\john.doe\\myfile": "myfile",
  ".\\.hidden\\myfile.js": "myfile",
  ".\\.hidden\\myfile": "myfile",
  "C:\\temp\\.gitignore": ".gitignore",
})

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
const regex = function filename(path) {
  return path.match(FILENAME_RE)?.[2];
}

const noRegex = function filename(path) {
  const base = basename(path);

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const SLASH_RE = /[/\\]/
const mixed = function filename(path) {
  const base = path.split(SLASH_RE).pop()
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const lastIndex = function filename(path) {
  let slashIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    if (SLASH_RE.test(path.charAt(i))) {
      slashIndex = i;
      break;
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const lastIndexOneLoop = function filename(path) {
  let slashIndex = -1;
  let dotIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    const char = path.charAt(i);
    if (SLASH_RE.test(char)) {
      slashIndex = i;
      break;
    }
    if (dotIndex === -1 && char === '.') {
      dotIndex = i
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  if (dotIndex <= 0) {
    return base;
  }

  return base.slice(0, dotIndex);
}

// Based on https://github.com/unjs/pathe/pull/180/files
// but using a regex to split the path
const basenameRegex = function (p, extension) {
  const segments = p.split(SLASH_RE);

  // default to empty string
  let lastSegment = "";
  for (let i = segments.length - 1; i >= 0; i--) {
      const val = segments[i];
      if (val) {
      lastSegment = val;
      break;
      }
  }

  return extension && lastSegment.endsWith(extension)
      ? lastSegment.slice(0, -extension.length)
      : lastSegment;
};

const withAlternativeBasename = function filename(path) {
  const base = basenameRegex(path)
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

paths.forEach((fixture) => {
  group(fixture, () => {
    Object.entries({
      regex,
      noRegex,
      // basename,
      mixed,
      lastIndex,
      lastIndexOneLoop,
      withAlternativeBasename
    }).forEach(([name, fn]) => {
      bench(name + ": " + fixture, () => {
        fn(fixture);
      });
    });
  });
});

run();

@cjpearson cjpearson force-pushed the issue-184 branch 2 times, most recently from 50fec47 to a576974 Compare September 27, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants